Skip to content

Proposed "snappy" vs "x-snappy-framed" content-type confusion #12825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

skandragon
Copy link
Contributor

@skandragon skandragon commented Apr 11, 2025

Description

The collector http compression and decompression logic uses "snappy" incorrectly. The content-type "snappy" should be the block ("non-framed") version, while "x-snappy-framed" should be the framed one. Unfortunately, the collector code used framed compression and decompression and called it "snappy."

This corrects this in two ways:

  • Add x-snappy-framed as a content-type for both incoming and outgoing compression.
  • Add a (hopefully temporary) hack that will, when decompressing content-type "snappy", peek into the data, and see if the 10 byte snappy-framed header is present. If it is, uncompress it using framed, otherwise use the block decompression.

In contrast to other proposals in the linked issue, this feels like a clean path forward as warnings about using "snappy" in the docs can be added, and people can either not change (in which case it will just start working using the block format) or change, in which case it will use framed.

Link to tracking issue

Fixes #10584 by providing a way forward

Testing

  • Test correct and incorrect configuration of both "snappy" and "x-snappy-framed" content-types.
  • Add a test for content-type "snappy" but framed data.

@skandragon skandragon requested a review from a team as a code owner April 11, 2025 06:52
Copy link

codecov bot commented Apr 11, 2025

Codecov Report

Attention: Patch coverage is 80.43478% with 9 lines in your changes missing coverage. Please review.

Project coverage is 91.42%. Comparing base (568c3ce) to head (db48f79).
Report is 104 commits behind head on main.

Files with missing lines Patch % Lines
config/confighttp/compression.go 76.00% 4 Missing and 2 partials ⚠️
config/confighttp/compressor.go 85.00% 2 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (80.43%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12825      +/-   ##
==========================================
- Coverage   91.44%   91.42%   -0.02%     
==========================================
  Files         487      487              
  Lines       26848    26892      +44     
==========================================
+ Hits        24551    24586      +35     
- Misses       1814     1820       +6     
- Partials      483      486       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dmathieu
Copy link
Member

Please fix the lint issues and add a changelog entry.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please link to what is the "authority" that defines the "correct" usage? I am not sure I understand who defines corectness here.

the "block" compression variant of "snappy". This change allows the collector to:
- The server endpoints will now accept "x-snappy-framed" as a valid
content-encoding.
- Client compression type "snappy" will now compress to the "block" variant of snappy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't do this, since as @jpkrohling explained, this will break existing servers that are not updated.

First we need to add support for both on both server/client. The "hack" for the server can also go now. Then wait for few versions until you can change the client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What specific behavior do we want here? I tried to avoid breaking receiving as right now that felt like the pain point. Should we add x-snappy-framed as a type first, and then at some point later switch the compression to use non-framed for "snappy"?

If I changed the PR to do this, would this satisfy the path forward and start the ball rolling?

Copy link
Contributor

@jade-guiton-dd jade-guiton-dd Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your PR avoids breakage in the situation where the receiver is updated but not the sender (and that's great!), but not in the opposite scenario: if the sender is updated but not the receiver, the sender will switch to block mode, and the receiver will not accept it. Of course, the sender can "just" update their config to explicitly switch back to framed mode, but this could still be a source of breakage.

Bogdan's suggestion is to add the protocol detection in the receiver (and the explicit framed mode) now, but to wait a few versions before switching "snappy" to use the block mode. This means that breakage will only occur if the sender gets both updates and the receiver gets none, which is less likely.

I think one additional possibility, to ease the transition and minimize how much code change needs to be postponed, would be to put the switch of snappy from framed to block mode behind an Alpha (off by default) feature gate, and add a warning log in confighttp when "snappy" is used with the gate disabled. Then later, we will only need to switch it to Beta (this is where the breaking change would be). What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this suggestion. For the components that need to use snappy encoding/decoding correctly, we just need to document that the new feature gate is required for them to work.

I don't see anybody else requiring this besides the prometheus remote-write receiver, which is not even an alpha component yet, so it sounds reasonable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with all the comments.

# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not enhancement, this is a complete breaking change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep it as enhancement if we switch the strategy to using feature flags?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would no longer be a breaking change if we delay the client changes (unless there are bugs in the protocol detection).

@skandragon
Copy link
Contributor Author

Please link to what is the "authority" that defines the "correct" usage? I am not sure I understand who defines corectness here.

There isn't an authority, except that the common usage is "snappy means block format" and "x-snappy-framed" means the streaming friendly framed format. The only real authority I could find is that in the spec for the framed format, it specifically defines this:

2. File type identification

The following identifiers for this format are recommended where appropriate.
However, note that none have been registered officially, so this is only to
be taken as a guideline. We use "Snappy framed" to distinguish between this
format and raw Snappy data.

  File extension:         .sz
  MIME type:              application/x-snappy-framed
  HTTP Content-Encoding:  x-snappy-framed

(see https://github.com/google/snappy/blob/main/framing_format.txt)

@ArthurSens
Copy link
Member

Hi @skandragon , thanks for working on this! I'm one of the codeowner of Prometheus Remote Write Receiver 👋

@perebaj and I were working on getting the receiver to alpha stage until we got blocked for several weeks due to prw requests failing to be decoded via httpconfig(open-telemetry/opentelemetry-collector-contrib#38899), until I randomly bumped into your PR today.

Now that I've read your PR and the linked issue, things are making sense! One suggestion to make sure your PR works for the remote-write receiver is trying to build the collector-contrib, but updating the collector dependency to your commit SHA and verifying that it works. See open-telemetry/opentelemetry-collector-contrib#38899 (comment) for instructions.

I'll try to find some time next week to help out here

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 -- I've just tested the code changes in open-telemetry/opentelemetry-collector-contrib#39446, it works :)

I agree that it's not that simple to just change things like has been said already. How do you feel about the suggestion of using feature flags instead @skandragon? We could abandon the logic that detects the format by reading the first bytes of the sequence, which also helps with code readability in my opinion 🤔

# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep it as enhancement if we switch the strategy to using feature flags?

@jade-guiton-dd
Copy link
Contributor

We could abandon the logic that detects the format by reading the first bytes of the sequence, which also helps with code readability in my opinion 🤔

I think removing the protocol detection logic would cause issues once we set the feature flag to beta; you wouldn't be able to update the client before the server.

@bogdandrutu
Copy link
Member

@jade-guiton-dd @skandragon @ArthurSens does this apply to gRPC as well?

@jade-guiton-dd
Copy link
Contributor

It looks like configgrpc uses the "snappy" gRPC compressor from https://github.com/mostynb/go-grpc-compression, which defers to https://github.com/golang/snappy. Not sure which version of the protocol this is implementing though.

It seems gRPC doesn't use the Content-Encoding header, and has its own compression header grpc-encoding, so I'm not sure if there are established conventions for that.

Copy link
Contributor

@perebaj perebaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skandragon
Copy link
Contributor Author

Hi @skandragon , thanks for working on this! I'm one of the codeowner of Prometheus Remote Write Receiver 👋

Howdy!

Perhaps we should chat a bit. I want to use the remote write receiver as part of our managed collector framework (cardinalhq.io for reference) where we want to make it easy to migrate from a prometheus shop to an Otel data flow. We are adding scraping configs, but that seems less useful than just getting the firehose from prometheus directly.

Let's chat and see if we can help develop the receiver.

@ArthurSens
Copy link
Member

That would be great!

We're currently developing it as part of a mentorship, @perebaj is doing most of the heavy work here :)
We discuss the development of all Prometheus-related components in CNCF's slack, specifically in the #otel-prometheus channel. Come say hi 👋

@perebaj
Copy link
Contributor

perebaj commented Apr 28, 2025

Hi @skandragon , thanks for working on this! I'm one of the codeowner of Prometheus Remote Write Receiver 👋

Howdy!

Perhaps we should chat a bit. I want to use the remote write receiver as part of our managed collector framework (cardinalhq.io for reference) where we want to make it easy to migrate from a prometheus shop to an Otel data flow. We are adding scraping configs, but that seems less useful than just getting the firehose from prometheus directly.

Let's chat and see if we can help develop the receiver.

You can also ping me on slack/git if any doubt arises. I hope that in the next weeks we will have the prw2 working...

github-merge-queue bot pushed a commit that referenced this pull request May 5, 2025
#### Description
This is an alternative PR to #12825.

I'm taking all the commits from that PR and adding the feature gate on
the client side, as requested by reviews. The server behavior of peeking
into the initial bytes to identify the encoding is kept :)

If you've reviewed the previous PR, you can just review the latest
commit!

<!-- Issue number if applicable -->
#### Link to tracking issue
Fixes #10584

---------

Signed-off-by: Arthur Silva Sens <[email protected]>
Co-authored-by: Michael Graff <[email protected]>
Co-authored-by: Alex Boten <[email protected]>
Co-authored-by: Pablo Baeyens <[email protected]>
Co-authored-by: Jonathan <[email protected]>
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 13, 2025
@jade-guiton-dd
Copy link
Contributor

Since #12911 was merged, I think this can be closed.

TimoBehrendt pushed a commit to TimoBehrendt/tracebasedlogsampler that referenced this pull request May 20, 2025
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [go.opentelemetry.io/collector/component](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/component/componenttest](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v0.125.0` -> `v0.126.0` |
| [go.opentelemetry.io/collector/confmap](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/consumer](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/consumer/consumertest](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v0.125.0` -> `v0.126.0` |
| [go.opentelemetry.io/collector/pdata](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/processor](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/processor/processortest](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v0.125.0` -> `v0.126.0` |

---

### Release Notes

<details>
<summary>open-telemetry/opentelemetry-collector (go.opentelemetry.io/collector/component)</summary>

### [`v1.32.0`](https://github.com/open-telemetry/opentelemetry-collector/blob/HEAD/CHANGELOG.md#v1320v01260)

##### 🛑 Breaking changes 🛑

-   `configauth`: Removes deprecated `configauth.Authentication` and `extensionauthtest.NewErrorClient` ([#&#8203;12992](open-telemetry/opentelemetry-collector#12992))
    The following have been removed:
    -   `configauth.Authentication` use `configauth.Config` instead
    -   `extensionauthtest.NewErrorClient` use `extensionauthtest.NewErr` instead

##### 💡 Enhancements 💡

-   `service`: Replace `go.opentelemetry.io/collector/semconv` usage with `go.opentelemetry.io/otel/semconv` ([#&#8203;12991](open-telemetry/opentelemetry-collector#12991))
-   `confmap`: Update the behavior of the confmap.enableMergeAppendOption feature gate to merge only component lists. ([#&#8203;12926](open-telemetry/opentelemetry-collector#12926))
-   `service`: Add item count metrics defined in Pipeline Component Telemetry RFC ([#&#8203;12812](open-telemetry/opentelemetry-collector#12812))
    See [Pipeline Component Telemetry RFC](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md) for more details:
    -   `otelcol.receiver.produced.items`
    -   `otelcol.processor.consumed.items`
    -   `otelcol.processor.produced.items`
    -   `otelcol.connector.consumed.items`
    -   `otelcol.connector.produced.items`
    -   `otelcol.exporter.consumed.items`
-   `tls`: Add trusted platform module (TPM) support to TLS authentication. ([#&#8203;12801](open-telemetry/opentelemetry-collector#12801))
    Now the TLS allows the use of TPM for loading private keys (e.g. in TSS2 format).

##### 🧰 Bug fixes 🧰

-   `exporterhelper`: Add validation error for batch config if min_size is greater than queue_size. ([#&#8203;12948](open-telemetry/opentelemetry-collector#12948))

-   `telemetry`: Allocate less memory per component when OTLP exporting of logs is disabled ([#&#8203;13014](open-telemetry/opentelemetry-collector#13014))

-   `confmap`: Use reflect.DeepEqual to avoid panic when confmap.enableMergeAppendOption feature gate is enabled. ([#&#8203;12932](open-telemetry/opentelemetry-collector#12932))

-   `internal telemetry`: Add resource attributes from telemetry.resource to the logger ([#&#8203;12582](open-telemetry/opentelemetry-collector#12582))
    Resource attributes from telemetry.resource were not added to the internal
    console logs.

    Now, they are added to the logger as part of the "resource" field.

-   `confighttp and configcompression`: Fix handling of `snappy` content-encoding in a backwards-compatible way ([#&#8203;10584](open-telemetry/opentelemetry-collector#10584), [#&#8203;12825](open-telemetry/opentelemetry-collector#12825))
    The collector used the Snappy compression type of "framed" to handle the HTTP
    content-encoding "snappy".  However, this encoding is typically used to indicate
    the "block" compression variant of "snappy".  This change allows the collector to:
    -   When receiving a request with encoding 'snappy', the server endpoints will peek
        at the first bytes of the payload to determine if it is "framed" or "block" snappy,
        and will decompress accordingly.  This is a backwards-compatible change.
    If the feature-gate "confighttp.framedSnappy" is enabled, you'll see new behavior for both client and server:
    -   Client compression type "snappy" will now compress to the "block" variant of snappy
        instead of "framed". Client compression type "x-snappy-framed" will now compress to the "framed" variant of snappy.
    -   Servers will accept both "snappy" and "x-snappy-framed" as valid content-encodings.

-   `tlsconfig`: Disable TPM tests on MacOS/Darwin ([#&#8203;12964](open-telemetry/opentelemetry-collector#12964))

<!-- previous-version -->

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get [config help](https://github.com/renovatebot/renovate/discussions) if that's undesired.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4yNjMuMSIsInVwZGF0ZWRJblZlciI6IjM5LjI2My4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6W119-->

Reviewed-on: https://gitea.t000-n.de/t.behrendt/tracebasedlogsampler/pulls/13
Co-authored-by: Renovate Bot <[email protected]>
Co-committed-by: Renovate Bot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[confighttp] snappy compress, Encode and NewReader inconsistent behavior
6 participants